培训大型神经网络架构的快速增长带来了对划分策略的需要,例如通过使用数据,模型或管道并行性。通过程序基元越来越多地支持这些方法,但识别有效的分区策略需要昂贵的实验和专业知识。我们介绍了自动分区器的原型,它无缝集成到现有的编译器和现有用户工作流中。我们的分区使SPMD风格的并行性能够包含数据并行性和参数/激活分片。通过归纳策略和在平台独立的分区IR中搜索的组合,Automap可以恢复用于变压器层的专家分区策略,如Megatron分片。
translated by 谷歌翻译
Deep learning techniques with neural networks have been used effectively in computational fluid dynamics (CFD) to obtain solutions to nonlinear differential equations. This paper presents a physics-informed neural network (PINN) approach to solve the Blasius function. This method eliminates the process of changing the non-linear differential equation to an initial value problem. Also, it tackles the convergence issue arising in the conventional series solution. It is seen that this method produces results that are at par with the numerical and conventional methods. The solution is extended to the negative axis to show that PINNs capture the singularity of the function at $\eta=-5.69$
translated by 谷歌翻译
Abstractive dialogue summarization has long been viewed as an important standalone task in natural language processing, but no previous work has explored the possibility of whether abstractive dialogue summarization can also be used as a means to boost an NLP system's performance on other important dialogue comprehension tasks. In this paper, we propose a novel type of dialogue summarization task - STRUctured DiaLoguE Summarization - that can help pre-trained language models to better understand dialogues and improve their performance on important dialogue comprehension tasks. We further collect human annotations of STRUDEL summaries over 400 dialogues and introduce a new STRUDEL dialogue comprehension modeling framework that integrates STRUDEL into a graph-neural-network-based dialogue reasoning module over transformer encoder language models to improve their dialogue comprehension abilities. In our empirical experiments on two important downstream dialogue comprehension tasks - dialogue question answering and dialogue response prediction - we show that our STRUDEL dialogue comprehension model can significantly improve the dialogue comprehension performance of transformer encoder language models.
translated by 谷歌翻译
A popular approach to creating a zero-shot cross-language retrieval model is to substitute a monolingual pretrained language model in the retrieval model with a multilingual pretrained language model such as Multilingual BERT. This multilingual model is fined-tuned to the retrieval task with monolingual data such as English MS MARCO using the same training recipe as the monolingual retrieval model used. However, such transferred models suffer from mismatches in the languages of the input text during training and inference. In this work, we propose transferring monolingual retrieval models using adapters, a parameter-efficient component for a transformer network. By adding adapters pretrained on language tasks for a specific language with task-specific adapters, prior work has shown that the adapter-enhanced models perform better than fine-tuning the entire model when transferring across languages in various NLP tasks. By constructing dense retrieval models with adapters, we show that models trained with monolingual data are more effective than fine-tuning the entire model when transferring to a Cross Language Information Retrieval (CLIR) setting. However, we found that the prior suggestion of replacing the language adapters to match the target language at inference time is suboptimal for dense retrieval models. We provide an in-depth analysis of this discrepancy between other cross-language NLP tasks and CLIR.
translated by 谷歌翻译
Conditional diffusion probabilistic models can model the distribution of natural images and can generate diverse and realistic samples based on given conditions. However, oftentimes their results can be unrealistic with observable color shifts and textures. We believe that this issue results from the divergence between the probabilistic distribution learned by the model and the distribution of natural images. The delicate conditions gradually enlarge the divergence during each sampling timestep. To address this issue, we introduce a new method that brings the predicted samples to the training data manifold using a pretrained unconditional diffusion model. The unconditional model acts as a regularizer and reduces the divergence introduced by the conditional model at each sampling step. We perform comprehensive experiments to demonstrate the effectiveness of our approach on super-resolution, colorization, turbulence removal, and image-deraining tasks. The improvements obtained by our method suggest that the priors can be incorporated as a general plugin for improving conditional diffusion models.
translated by 谷歌翻译
Generating photos satisfying multiple constraints find broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models. Since each sampling step in the DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows using different off-the-shelf diffusion models trained across various datasets during sampling time alone to guide it to the desired outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found in https://nithin-gk.github.io/projectpages/Multidiff/index.html
translated by 谷歌翻译
Recently, Robey et al. propose a notion of probabilistic robustness, which, at a high-level, requires a classifier to be robust to most but not all perturbations. They show that for certain hypothesis classes where proper learning under worst-case robustness is \textit{not} possible, proper learning under probabilistic robustness \textit{is} possible with sample complexity exponentially smaller than in the worst-case robustness setting. This motivates the question of whether proper learning under probabilistic robustness is always possible. In this paper, we show that this is \textit{not} the case. We exhibit examples of hypothesis classes $\mathcal{H}$ with finite VC dimension that are \textit{not} probabilistically robustly PAC learnable with \textit{any} proper learning rule. However, if we compare the output of the learner to the best hypothesis for a slightly \textit{stronger} level of probabilistic robustness, we show that not only is proper learning \textit{always} possible, but it is possible via empirical risk minimization.
translated by 谷歌翻译
Autonomous driving requires efficient reasoning about the location and appearance of the different agents in the scene, which aids in downstream tasks such as object detection, object tracking, and path planning. The past few years have witnessed a surge in approaches that combine the different taskbased modules of the classic self-driving stack into an End-toEnd(E2E) trainable learning system. These approaches replace perception, prediction, and sensor fusion modules with a single contiguous module with shared latent space embedding, from which one extracts a human-interpretable representation of the scene. One of the most popular representations is the Birds-eye View (BEV), which expresses the location of different traffic participants in the ego vehicle frame from a top-down view. However, a BEV does not capture the chromatic appearance information of the participants. To overcome this limitation, we propose a novel representation that captures various traffic participants appearance and occupancy information from an array of monocular cameras covering 360 deg field of view (FOV). We use a learned image embedding of all camera images to generate a BEV of the scene at any instant that captures both appearance and occupancy of the scene, which can aid in downstream tasks such as object tracking and executing language-based commands. We test the efficacy of our approach on synthetic dataset generated from CARLA. The code, data set, and results can be found at https://rebrand.ly/APP OCC-results.
translated by 谷歌翻译
The inception of large language models has helped advance state-of-the-art performance on numerous natural language tasks. This has also opened the door for the development of foundation models for other domains and data modalities such as images, code, and music. In this paper, we argue that business process data representations have unique characteristics that warrant the development of a new class of foundation models to handle tasks like process mining, optimization, and decision making. These models should also tackle the unique challenges of applying AI to business processes which include data scarcity, multi-modal representations, domain specific terminology, and privacy concerns.
translated by 谷歌翻译
Observational studies have recently received significant attention from the machine learning community due to the increasingly available non-experimental observational data and the limitations of the experimental studies, such as considerable cost, impracticality, small and less representative sample sizes, etc. In observational studies, de-confounding is a fundamental problem of individualised treatment effects (ITE) estimation. This paper proposes disentangled representations with adversarial training to selectively balance the confounders in the binary treatment setting for the ITE estimation. The adversarial training of treatment policy selectively encourages treatment-agnostic balanced representations for the confounders and helps to estimate the ITE in the observational studies via counterfactual inference. Empirical results on synthetic and real-world datasets, with varying degrees of confounding, prove that our proposed approach improves the state-of-the-art methods in achieving lower error in the ITE estimation.
translated by 谷歌翻译